Process Mining: On the Balance Between Underfitting and Overfitting

Author

  • W.M.P. van der Aalst
Abstract

Process mining techniques attempt to extract non-trivial and useful information from event logs. One aspect of process mining is control-flow discovery, i.e., automatically constructing a process model (e.g., a Petri net) describing the causal dependencies between activities. One of the essential problems in process mining is that one cannot assume to have seen all possible behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more behavior. This generalization is often driven by the representation language and very crude assumptions about completeness. As a result, parts of the model are “overfitting” (allow only what has actually been observed) while other parts may be “underfitting” (allow for much more behavior without strong support for it). This talk will present the main challenges posed by real-life applications of process mining and show that it is possible to balance between overfitting and underfitting in a controlled manner.
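The two extremes named in the abstract can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the technique presented in the talk: the toy log and the three acceptor functions (`trace_model`, `flower_model`, `directly_follows_model`) are hypothetical names introduced here. The first model accepts exactly the observed traces (overfitting), the second accepts any sequence over the observed activities (underfitting), and the third generalizes only as far as the observed directly-follows pairs allow.

```python
# Toy event log (assumed for illustration): each trace is the ordered
# sequence of activities recorded for one case.
LOG = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
]


def trace_model(log):
    """Overfitting extreme: accept exactly the traces that were observed."""
    seen = set(log)
    return lambda trace: tuple(trace) in seen


def flower_model(log):
    """Underfitting extreme: accept any sequence over the observed activities."""
    alphabet = {act for trace in log for act in trace}
    return lambda trace: all(act in alphabet for act in trace)


def directly_follows_model(log):
    """Middle ground: accept a trace if its first activity, last activity,
    and every pair of consecutive activities were observed in the log."""
    starts = {t[0] for t in log}
    ends = {t[-1] for t in log}
    pairs = {(x, y) for t in log for x, y in zip(t, t[1:])}

    def accepts(trace):
        trace = tuple(trace)
        if not trace:
            return False
        return (
            trace[0] in starts
            and trace[-1] in ends
            and all(p in pairs for p in zip(trace, trace[1:]))
        )

    return accepts


if __name__ == "__main__":
    models = {
        "trace": trace_model(LOG),
        "flower": flower_model(LOG),
        "directly-follows": directly_follows_model(LOG),
    }
    # One observed trace, one plausible unseen trace, one implausible trace.
    for candidate in [("a", "b", "c", "d"), ("a", "b", "d"), ("d", "a")]:
        verdicts = {name: accepts(candidate) for name, accepts in models.items()}
        print(candidate, verdicts)
```

In this sketch the unseen trace `("a", "b", "d")` is rejected by the trace model but accepted by the other two, while `("d", "a")` is accepted only by the flower model. How much generalization is justified still depends on how complete the log is assumed to be, which is the question the abstract raises.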

Similar resources

Simplifying Mined Process Models: An Approach Based on Unfoldings

Process models discovered using process mining tend to be complex and have problems balancing between overfitting and underfitting. Overfitting models are not general enough while underfitting models allow for too much behavior. This paper presents a post-processing approach to simplify discovered process models while controlling the balance between overfitting and underfitting. The discovered ...

Simplifying discovered process models in a controlled manner

Process models discovered from a process log using process mining tend to be complex and have problems balancing between overfitting and underfitting. An overfitting model allows for too little behavior as it just permits the traces in the log and no other trace. An underfitting model allows for too much behavior as it permits traces that are significantly different from the behavior seen in th...

Time prediction based on process mining

Process mining allows for the automated discovery of process models from event logs. These models provide insights and enable various types of model-based analysis. This paper demonstrates that the discovered process models can be extended with information to predict the completion time of running instances. There are many scenarios where it is useful to have reliable time predictions. For exam...

J-measure Based Hybrid Pruning for Complexity Reduction in Classification Rules

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other...

Improving Process Model Precision by Loop Unrolling

Despite the advent of scalable process mining techniques that can handle both noisy and incomplete real-life event logs, there is a lack of scalable algorithms capable of handling a common cause of model underfitting: when the same activity in the log in fact behaves differently depending on the number of occurrences in a particular trace. This paper proposes a simple scalable technique to iden...

Publication date: 2008